Inserting replay events in network production flows

ABSTRACT

An example method includes detecting a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a stream of production events and a stream of replay events including at least the first replay event, processing the first replay event to generate a ticket for the first replay event, and delivering the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

The present disclosure relates generally to the testing and validation of complex event flows, and relates more particularly to devices, non-transitory computer-readable media, and methods for inserting replay events into production flows.

BACKGROUND

Service assurance (SA) is a term used to describe the application of policies and processes by a telecommunications service provider to ensure that services offered to customers over a telecommunications service provider network meet a pre-defined service quality level. One aspect of SA involves issue tracking, or detecting and resolving events in the network that are indicative of the service failing to meet the pre-defined service quality level or of the service degrading. When such an event is detected, a ticketing system of an SA platform may generate a ticket for review by a technician. The ticket may include information about the detected event that can help the technician to diagnose the cause of the event and to perform some remedial action.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates an example system in which examples of the present disclosure for inserting replay events into production flows may operate;

FIG. 2 illustrates one particular example of the application server of FIG. 1 in further detail;

FIG. 3 illustrates a flowchart of an example method for inserting replay events into production flows, in accordance with the present disclosure;

FIG. 4 illustrates a flowchart of another example method for inserting replay events into production flows, in accordance with the present disclosure;

FIG. 5 illustrates a flowchart of an example method for processing replay events that have been inserted into production flows, in accordance with the present disclosure; and

FIG. 6 depicts a high-level block diagram of a computing device or processing system specifically programmed to perform the functions described herein.

To facilitate understanding, similar reference numerals have been used, where possible, to designate elements that are common to the figures.

DETAILED DESCRIPTION

The present disclosure broadly discloses methods, computer-readable media, and systems for inserting replay events into production flows. In one example, a method performed by a processing system includes detecting a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a stream of production events and a stream of replay events including at least the first replay event, processing the first replay event to generate a ticket for the first replay event, and delivering the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

In another example, a non-transitory computer-readable medium may store instructions which, when executed by a processing system in a communications network, cause the processing system to perform operations. The operations may include detecting a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a stream of production events and a stream of replay events including at least the first replay event, processing the first replay event to generate a ticket for the first replay event, and delivering the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

In another example, a system may include a processing system including at least one processor and a non-transitory computer-readable medium storing instructions which, when executed by the processing system when deployed in a communications network, cause the processing system to perform operations. The operations may include detecting a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a stream of production events and a stream of replay events including at least the first replay event, processing the first replay event to generate a ticket for the first replay event, and delivering the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

As discussed above, one aspect of service assurance (SA) involves issue tracking, or detecting and resolving events in a telecommunications service provider network that are indicative of a service failing to meet a pre-defined service quality level. When such an event is detected, a ticketing system of an SA platform may generate a ticket for review by a technician. The testing and validation of complex event flows in systems such as a ticketing system, or any other systems of an SA or similar platform, often involve “replaying” events into the system when testing a new or modified policy. For instance, conventional testing and validation techniques may create a second flow (e.g., a “replay flow”), separate from the production flow (where the production flow comprises a real time flow of new events), in which replay events can be inserted and policies can be modified to measure the effects of the modified policies without disrupting the ongoing production flow.

“Replay events,” within the context of the present disclosure, are historical production events that have been observed and captured by the system in a raw or original format, such that when a replay event is inserted back into an event flow, the system treats the replay event as if the replay event is an autonomous event, except for one difference. The difference is that the replay event contains metadata (e.g., in the form of a replay header or flag) that contains some attributes used by the replay functionality to track the replay event as it traverses the SA platform.

Examples of the present disclosure provide the option to insert replay events into either a dedicated replay flow or a production flow for end-to-end testing and validation of an SA platform system. The production flow is connected to downstream systems in the production environment (e.g., the production environment may be an SA platform, and the downstream systems may include a ticketing system). Replay events can be inserted into the production flow or the replay flow at various points in the production flow and the replay flow. As discussed above, when the replay events are inserted into the production flow, metadata associated with the replay events is used to ensure proper handling of the replay events by components of the SA platform.

Key attributes contained in the metadata of a replay event may include, but are not limited to, attributes such as a unique identifier that identifies a specific job associated with the replay event, a time stamp indicating a time at which the original production event (of which the replay event is a duplicate) was observed in the production flow, and a flow indicator identifying an event flow to which the replay event belongs (e.g., a production flow or a replay flow). The flow indicator is typically used when inserting a replay event into a production flow. Since replay events can be inserted into both the production flow and the replay flow (depending on the replay entry point), the flow indicator is leveraged by both the production flow and the replay flow such that, when a replay event is replayed into the replay flow, the production flow drops or ignores the replay event (and vice versa). These and other aspects of the present disclosure are discussed in greater detail below in connection with the examples of FIGS. 1-6.
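
To make these attributes concrete, the following is a minimal sketch, not taken from the disclosure, of how a replay header and the flow-indicator check might be represented; the field names, the `Event` container, and the `should_process` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReplayHeader:
    """Illustrative replay metadata; field names are assumptions."""
    job_id: str                # unique identifier of the replay job
    original_timestamp: float  # when the original production event was observed
    flow_indicator: str        # "production" or "replay"

@dataclass
class Event:
    payload: dict
    replay_header: Optional[ReplayHeader] = None  # absent for ordinary production events

def should_process(event: Event, flow: str) -> bool:
    """A given flow drops replay events addressed to the other flow."""
    if event.replay_header is None:
        # Original production events are handled only by the production flow.
        return flow == "production"
    return event.replay_header.flow_indicator == flow
```

Under this sketch, a replay event targeted at the replay flow would return False from `should_process(event, "production")` and be dropped by the production flow, mirroring the behavior described above.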

Although examples of the present disclosure are described within the context of a service assurance platform, it will be appreciated that the techniques described herein could be used to test and validate policies for any type of platform or system in which it may be helpful to replay historical events as part of the testing and validation process.

To further aid in understanding the present disclosure, FIG. 1 illustrates an example system 100 in which examples of the present disclosure for inserting replay events into production flows may operate. The system 100 may include any one or more types of communication networks, such as a traditional circuit switched network (e.g., a public switched telephone network (PSTN)) or a packet network such as an Internet Protocol (IP) network (e.g., an IP Multimedia Subsystem (IMS) network), an asynchronous transfer mode (ATM) network, a wired network, a wireless network, and/or a cellular network (e.g., 2G-5G, a long term evolution (LTE) network, and the like) related to the current disclosure. It should be noted that an IP network is broadly defined as a network that uses Internet Protocol to exchange data packets. Additional example IP networks include Voice over IP (VoIP) networks, Service over IP (SoIP) networks, the World Wide Web, and the like.

In one example, the system 100 may comprise a core network 102. The core network 102 may be in communication with one or more access networks 120 and 122, and with the Internet 124. In one example, the core network 102 may functionally comprise a fixed mobile convergence (FMC) network, e.g., an IP Multimedia Subsystem (IMS) network. In addition, the core network 102 may functionally comprise a telephony network, e.g., an Internet Protocol/Multi-Protocol Label Switching (IP/MPLS) backbone network utilizing Session Initiation Protocol (SIP) for circuit-switched and Voice over Internet Protocol (VoIP) telephony services. In one example, the core network 102 may include at least one application server (AS) 104, at least one database (DB) 106, a production events collector (or, simply, “collector”) 116, a replay dispatcher 118, and a plurality of edge routers 128-130. For ease of illustration, various additional elements of the core network 102 are omitted from FIG. 1.

In one example, the access networks 120 and 122 may comprise Digital Subscriber Line (DSL) networks, public switched telephone network (PSTN) access networks, broadband cable access networks, Local Area Networks (LANs), wireless access networks (e.g., an IEEE 802.11/Wi-Fi network and the like), cellular access networks, 3rd party networks, and the like. For example, the operator of the core network 102 may provide a cable television service, an IPTV service, or any other types of telecommunication services to subscribers via access networks 120 and 122. In one example, the access networks 120 and 122 may comprise different types of access networks, may comprise the same type of access network, or some access networks may be the same type of access network and others may be different types of access networks. In one example, the core network 102 may be operated by a telecommunication network service provider. The core network 102 and the access networks 120 and 122 may be operated by different service providers, the same service provider, or a combination thereof, or the access networks 120 and/or 122 may be operated by entities having core businesses that are not related to telecommunications services, e.g., corporate, governmental, or educational institution LANs, and the like.

In one example, the access network 120 may be in communication with one or more user endpoint devices 108 and 110. Similarly, the access network 122 may be in communication with one or more user endpoint devices 112 and 114. The access networks 120 and 122 may transmit and receive communications between the user endpoint devices 108, 110, 112, and 114, and between the user endpoint devices 108, 110, 112, and 114 and the server(s) 126, the AS 104, other components of the core network 102, devices reachable via the Internet in general, and so forth.

In one example, each of the user endpoint devices 108, 110, 112, and 114 may comprise any single device or combination of devices that may comprise a user endpoint device. For example, the user endpoint devices 108, 110, 112, and 114 may each comprise a mobile device, a cellular smart phone, a gaming console, a set top box, a laptop computer, a tablet computer, a desktop computer, an Internet of Things (IoT) device, a wearable smart device (e.g., a smart watch, a fitness tracker, a head mounted display, or Internet-connected glasses), an application server, a bank or cluster of such devices, and the like. To this end, the user endpoint devices 108, 110, 112, and 114 may comprise one or more physical devices, e.g., one or more computing systems or servers, such as computing system 600 depicted in FIG. 6, and may be configured as described below. At least one of the user endpoint devices 108, 110, 112, and 114 may comprise a computing device operated by a human technician or a policy writer of an SA platform of the telecommunications service provider network, where the at least one user endpoint device may be used to control testing and validation of policies associated with the SA platform as discussed in further detail below.

In one example, one or more servers 126 may be accessible to user endpoint devices 108, 110, 112, and 114 via Internet 124 in general. The server(s) 126 may operate in a manner similar to the AS 104, which is described in further detail below. Alternatively, the servers may comprise application servers accessible by the user endpoint devices 108, 110, 112, and 114 that provide one or more services (e.g., media streaming servers, web servers, etc.).

In one example, the collector 116, the replay dispatcher 118, and the AS 104 may collectively comprise a system that is provided by an operator of the core network 102 (e.g., a telecommunication service provider) as part of a service assurance platform. For instance, in one example, the AS 104 may comprise an issue tracking system that processes production events collected by the collector 116 and replay events provided by the replay dispatcher 118.

More specifically, the collector 116 may detect and collect information about events that occur in the system 100 that may be indicative of a degradation in the service provided to user endpoint devices 108, 110, 112, and 114. These fault/performance events may take the form of simple network management protocol (SNMP) events, system logging protocol (syslog) events, polling events, streaming telemetry events, and other types of fault events. In one example, the events collected by the collector 116 may be collected in one or more machine-readable forms that are not readable by a human technician (i.e., not readable without some further processing). In one example, events collected by the collector 116 may also be referred to herein as “production events.” A production event is an original event that occurs in a production environment (as opposed to a “replay event,” which is discussed in greater detail below).

In one example, an SNMP event may comprise a fault or performance event (also referred to as a “trap”) sent by a managed device (e.g., one of the user endpoint devices 108, 110, 112, and 114 or a network device such as edge routers 128 and 130, servers 126, and the like) when a change-of-state (COS) event occurs. The COS event may comprise, for example, a power outage, a security breach, a simple status event (e.g., a door opening and closing, etc.), and other events indicating a change-of-status. SNMP events may be sent by the managed devices according to a regular schedule (e.g., updates sent every x minutes) or automatically in response to a COS event being detected.

In one example, a syslog event may comprise a message generated by a network device, such as a router or switch (e.g., edge routers 128 and 130, servers 126, etc.), to record an event observed by the network device. For instance, a syslog event associated with a router might comprise a user logging on to a console session, while a syslog event associated with a web server might comprise an access-denied event.

In one example, a polling event may be detected by a poller upon a failure of a network device, such as a router, a switch, or a server (e.g., edge routers 128 and 130, servers 126, etc.), or of network interface cards or protocols (e.g., border gateway protocol) configured on those network devices, to respond to a polling message. In this case, the polling message may comprise a message sent to the network device to confirm the network device's readiness or state. The failure to respond may indicate a network device failure or a card or interface protocol failure on the network device.

In one example, a streaming telemetry event may comprise status data (e.g., up or down) that is reported by a network device, such as a router or switch (e.g., edge routers 128 and 130, servers 126, etc.), or other devices in an automatic, continuous manner (i.e., without the need for polling). For instance, a streaming telemetry event might comprise a report of real time packet drops or utilization on network links. It is noted that SNMP, syslog, and streaming telemetry type events may all report the same or similar types of failures. Which type of event reports a failure is dependent upon vendor implementations. For instance, a device vendor may configure certain types of failures (e.g., device, card, interface, protocol, environmental failure, etc.) to be reported using certain types of events (e.g., SNMP, syslog, streaming telemetry, etc.).

In one example, the collector 116 stores production events in one or more partitions or virtual groups referred to herein as “topics,” where a topic carries a specific type of event (e.g., a simple network management protocol event, a system logging protocol event, a polling event, or the like, which may be further partitioned into sub-topics). The use of topics allows the collector 116 to store the production events in an ordered fashion (e.g., the collector 116 may append production events one after another and create a log file). The use of topics may also facilitate handling of the production events by one or more downstream devices or systems (e.g., components of the AS 104). It should be noted that multiple different collectors may be deployed to collect different types of events (e.g., SNMP, syslog, polling, streaming telemetry, etc.). Thus, in one example, the collector 116 may represent multiple collectors.
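
As an illustrative sketch only (the disclosure does not prescribe a particular message bus or API), topic-based collection can be pictured as appending each event to a per-topic log that downstream components then consume in order; the class and method names below are assumptions.

```python
from collections import defaultdict

class Collector:
    """Illustrative collector that groups production events into topics."""

    def __init__(self):
        # One append-only log per topic, e.g. "snmp", "syslog", "polling".
        self._topics = defaultdict(list)

    def publish(self, topic: str, event) -> int:
        """Append an event to its topic log and return its offset."""
        log = self._topics[topic]
        log.append(event)
        return len(log) - 1

    def read(self, topic: str, offset: int = 0):
        """Downstream components (e.g., pre-processing) read a topic in order."""
        return self._topics[topic][offset:]
```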

In one example, the replay dispatcher 118 may duplicate production events collected by the collector 116 to produce replay events. As discussed above, a “replay event,” as understood within the context of the present disclosure, comprises a duplicate of a production event that additionally includes some metadata (e.g., a header or a flag set in a field of the header) to indicate that it is a replay event rather than a newly detected production event.
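
A hypothetical sketch of this duplication step follows: the dispatcher copies a captured production event and attaches the replay header described earlier (the `Event`, `ReplayHeader`, and `observed_at` names reuse the illustrative assumptions from the earlier sketches).

```python
import copy
import time
import uuid

def make_replay_event(production_event: Event, target_flow: str) -> Event:
    """Duplicate a captured production event and tag it as a replay event."""
    replayed = Event(payload=copy.deepcopy(production_event.payload))
    replayed.replay_header = ReplayHeader(
        job_id=str(uuid.uuid4()),  # identifies the replay job
        original_timestamp=production_event.payload.get("observed_at", time.time()),
        flow_indicator=target_flow,  # "production" or "replay"
    )
    return replayed
```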

In one example, the collector 116 may forward production events to the AS 104 to be inserted into a stream of production events 132. The stream of production events 132 may comprise production events collected by the collector 116 for processing by the AS 104 (e.g., by various components of the AS 104 which may provide ticketing-related functions). Alternatively, the replay dispatcher 118 may forward replay events to the AS 104 in a dedicated stream of replay events 134. The stream of replay events 134 may exclusively comprise replay events generated by the replay dispatcher 118.

In accordance with the present disclosure, the AS 104 and DB 106 may be configured to provide one or more operations or functions in connection with examples of the present disclosure for inserting replay events into production flows. For instance, the AS 104 may comprise a plurality of components that provide ticketing-related functions for the production events and the replay events.

To this end, the AS 104 may comprise one or more physical devices, e.g., one or more computing systems or servers, such as computing system 600 depicted in FIG. 6, and may be configured as described below. It should be noted that, as used herein, the terms “configure” and “reconfigure” may refer to programming or loading a processing system with computer-readable/computer-executable instructions, code, and/or programs, e.g., in a distributed or non-distributed memory, which when executed by a processor, or processors, of the processing system within a same device or within distributed devices, may cause the processing system to perform various functions. Such terms may also encompass providing variables, data values, tables, objects, or other data structures or the like which may cause a processing system executing computer-readable instructions, code, and/or programs to function differently depending upon the values of the variables or other data structures that are provided. As referred to herein, a “processing system” may comprise a computing device including one or more processors, or cores (e.g., as illustrated in FIG. 6 and discussed below), or multiple computing devices collectively configured to perform various steps, functions, and/or operations in accordance with the present disclosure.

One particular example of the AS 104 of FIG. 1 is illustrated in further detail in FIG. 2. As illustrated in FIG. 2, the AS 104 may comprise a plurality of components, including a pre-processing component 202, a post-processing component 204, a ticketing component 206 (which performs ticket creation from post-processed alarm events and ticket correlation), and a ticket management component 208 (which performs ticket enrichment, ticket storage, ticket graphical user interface (GUI) functions, and the like). Each of the pre-processing component 202, the post-processing component 204, and the ticketing component 206 may comprise at least two sub-components for processing different event flows. For instance, the pre-processing component 202 may comprise pre-processing component 202 a and pre-processing component 202 b; the post-processing component 204 may comprise post-processing component 204 a and post-processing component 204 b; and the ticketing component 206 may comprise ticketing component 206 a and ticketing component 206 b.

The pre-processing component 202 a, post-processing component 204 a, ticketing component 206 a, and ticket management component 208 may comprise a first processing path 210 for processing production events, while the pre-processing component 202 b, post-processing component 204 b, and ticketing component 206 b may comprise a second processing path 212 for processing replay events. In one example, components along the first processing path 210 are not configured to process replay events. As discussed above, components may detect when an event is a replay event by analyzing the flow type identifier in the metadata of the replay event.

Each of the components, i.e., pre-processing component 202 a, pre-processing component 202 b, post-processing component 204 a, post-processing component 204 b, ticketing component 206 a, ticketing component 206 b, and ticket management component 208, may subscribe to one or more topics, where the topics are used as described above to group events. By subscribing to specific topics, the components may control the types of events that the components receive and process.

In one example, replay events (either in the stream of replay events 134 or inserted into the stream of production events 132) may be delivered directly to the pre-processing component 202 b. FIG. 2 illustrates an example in which the replay events are inserted into the stream of production events 132 to produce a production flow combined with inserted replay events, indicated by arrow 214. As illustrated, events in the production flow combined with inserted replay events 214 may be directed to the appropriate processing path (e.g., first processing path 210 or second processing path 212) when the production flow combined with inserted replay events 214 reaches the pre-processing component 202. However, if the replay events are part of a dedicated replay stream (rather than being inserted into the production stream), the operations performed by the components of the second processing path 212 on the replay events would be the same.
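
The routing decision at the head of the combined flow can be sketched as below; this is a simplified illustration, with the `route` function and the path objects being assumptions, reusing the flow-indicator convention from the earlier metadata sketch.

```python
def route(event: Event, production_path, replay_path) -> None:
    """Direct each event in the combined flow 214 to the appropriate processing path."""
    if event.replay_header is None:
        # Ordinary production event: handled by the first processing path 210.
        production_path.pre_process(event)
    elif event.replay_header.flow_indicator == "production":
        # Replay event inserted into the production flow: handled by the
        # replay-aware second processing path 212.
        replay_path.pre_process(event)
    else:
        # Replay event addressed to the dedicated replay flow: the production
        # flow drops or ignores it, as discussed above.
        pass
```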

Thus, the pre-processing component 202 b may perform one or more pre-processing operations on the replay events. These pre-processing operations may reformat the replay events so that the replay events can be read by a human technician. For instance, as discussed above, production events (and the replay events that are generated by duplicating the production events) may be generated in a machine-readable form that can be understood by a computer, but not by a human technician. Thus, reformatting of the replay events may be needed to facilitate analysis by a human technician. In one example, the pre-processing operations performed by the pre-processing component 202 b may include virtual network function event stream (VES) normalization.

Once pre-processing has been performed by the pre-processing component 202 b, the replay events may pass along the second processing path 212 to the post-processing component 204 b. The post-processing component 204 b may perform one or more post-processing operations on the replay events, such as alarm processing rules including smoothing (also referred to as aging), de-duplication, suppression, chronic/flapping, alarm correlation, and the like. These post-processing operations may create, for each reformatted replay event on the second processing path 212, an alarm based on the reformatted replay event. The alarm may comprise a notification to alert a human technician to the occurrence of the replay event. The post-processing component 204 b may correlate alarms that are redundant or that are related to the same root cause. For instance, an alarm relating to a failed router and an alarm relating to a failed port of the router may be correlated, as failure of the port is likely to be caused by failure of the router (and, hence, restoring the router will likely restore the port).
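
A minimal sketch of the alarm-and-correlate step is given below; the de-duplication key, the `source`/`type` field names, and the parent/child correlation rule (a "router:port" naming convention) are illustrative assumptions, not required behavior of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class Alarm:
    source: str        # e.g., "router-7" or "router-7:port-3"
    description: str
    correlated_to: list = field(default_factory=list)  # child alarms with the same root cause

def post_process(events: list) -> list:
    """Turn reformatted events into alarms, de-duplicate them, and correlate them."""
    alarms, seen = [], set()
    for ev in events:
        key = (ev["source"], ev["type"])
        if key in seen:
            continue  # simple de-duplication / suppression
        seen.add(key)
        alarms.append(Alarm(source=ev["source"], description=ev["type"]))
    # Correlate a child alarm (e.g., a failed port) to the parent alarm sharing
    # the same root cause (e.g., the failed router that hosts the port).
    by_source = {a.source: a for a in alarms}
    for alarm in alarms:
        parent = alarm.source.rsplit(":", 1)[0]
        if parent != alarm.source and parent in by_source:
            by_source[parent].correlated_to.append(alarm)
    return alarms
```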

Once post-processing has been performed by the post-processing component 204 b, the second processing path 212 may pass the alarmed replay events to the ticketing component 206 b. The ticketing component 206 b may create, for each alarmed replay event on the second processing path 212, a ticket. The ticket may comprise a tracking tool that contains information about the alarmed replay event. The ticket may be used, for example, to prioritize alarmed replay events, organize alarmed replay events into a queue, correlate alarmed replay events, and the like. The ticketing component 206 b may also correlate tickets which are redundant or tickets which originate from two or more independent domains (e.g., a 5G fault that is caused by a wireline fault alarm). Correlation of tickets ensures that the limited resources of the human technicians are deployed in the most optimal manner (e.g., the technicians do not waste time trying to resolve redundant tickets when other, potentially more pressing tickets may be queued).
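
The ticketing step might look roughly like the sketch below, which reuses the `Alarm` and `ReplayHeader` types from the earlier sketches; the `Ticket` fields and the priority/queueing logic are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

@dataclass(order=True)
class Ticket:
    priority: int
    alarm: Alarm = field(compare=False, default=None)
    replay_header: Optional[ReplayHeader] = field(compare=False, default=None)

def ticket_alarms(alarms: list, header: Optional[ReplayHeader], queue: list) -> None:
    """Create one ticket per root-cause alarm and keep the tickets in a priority queue."""
    children = {id(child) for a in alarms for child in a.correlated_to}
    for alarm in alarms:
        if id(alarm) in children:
            # A correlated (child) alarm does not get a redundant ticket of its own.
            continue
        priority = 1 if alarm.correlated_to else 2  # root-cause tickets first
        heapq.heappush(queue, Ticket(priority=priority, alarm=alarm, replay_header=header))
```

Keeping the replay header on the ticket is what later lets the ticket be matched back to the replay job, as discussed in connection with FIG. 4 below.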

Once the replay events have been reformatted, alarmed, and ticketed, the second processing path 212 may output the tickets to a database (e.g., DB 106 of FIG. 1). It should be noted that, in one example, the pre-processing component 202 b and/or the post-processing component 204 b may preserve the metadata (e.g., headers) of the replay events until tickets for the replay events are generated by the ticketing component 206 b and then forwarded to the ticket management component 208 (which may perform operations including ticket enrichment, ticket storage, ticket GUI operations, and the like).

Operations performed on the production events in the first processing path 210 are similar to those described above. For instance, the pre-processing component 202 a may perform one or more pre-processing operations on the production events in the first processing path 210. These pre-processing operations may reformat the production events as discussed above so that the production events can be read by a human technician.

Once pre-processing has been performed by the pre-processing component 202 a, the first processing path 210 may pass the production events to the post-processing component 204 a. The post-processing component 204 a may create, for each reformatted production event in the first processing path 210, an alarm based on the reformatted production event as described above. The post-processing component 204 a may also correlate alarms.

Once post-processing has been performed by the post-processing component 204 a, the first processing path 210 may pass the production events to the ticketing component 206 a. The ticketing component 206 a may create, for each alarmed production event in the first processing path 210, a ticket as described above. The ticketing component 206 a may also perform ticket correlation as described above.

Once the production events have been reformatted, alarmed, and ticketed, the production flow on the first processing path 210 may deliver the production events to the ticket management component 208.

As illustrated in FIG. 2, the production flow combined with inserted replay events 214 may be inserted into the first processing path 210 and/or the second processing path 212 at multiple points. For instance, the production flow combined with inserted replay events 214 may be inserted directly before the pre-processing component 202, directly before the post-processing component 204, or directly before the ticketing component 206. Thus, replaying of replay events may skip one or more components of the AS 104; replay events may be inserted anywhere in the end-to-end flow, in which case the replay events may mimic the output of earlier components in the first processing path 210 or the second processing path 212.

Referring back to FIG. 1, the DB 106, as described above, may store tickets associated with processed events of the first processing path 210 and/or the second processing path 212. In one example, DB 106 may comprise a physical storage device integrated with the AS 104 (e.g., a database server or a file server), or attached or coupled to the AS 104, in accordance with the present disclosure. In one example, the AS 104 may load instructions into a memory, or one or more distributed memory units, and execute the instructions for processing replay events detected in production flows, as described herein. An example method for processing replay events that have been inserted into production flows is described in greater detail below in connection with FIG. 5.

It should be noted that the system 100 has been simplified. Thus, those skilled in the art will realize that the system 100 may be implemented in a different form than that which is illustrated in FIG. 1, or may be expanded by including additional endpoint devices, access networks, network elements, application servers, etc., without altering the scope of the present disclosure. In addition, system 100 may be altered to omit various elements, substitute elements for devices that perform the same or similar functions, combine elements that are illustrated as separate devices, and/or implement network elements as functions that are spread across several devices that operate collectively as the respective network elements. For example, the system 100 may include other network elements (not shown) such as border elements, routers, switches, policy servers, security devices, gateways, a content distribution network (CDN), and the like. For example, portions of the core network 102, access networks 120 and 122, and/or Internet 124 may comprise a content distribution network (CDN) having ingest servers, edge servers, and the like. Similarly, although only two access networks, 120 and 122, are shown, in other examples, access networks 120 and/or 122 may each comprise a plurality of different access networks that may interface with the core network 102 independently or in a chained manner. For example, user endpoint devices 108, 110, 112, and 114 may communicate with the core network 102 via different access networks, user endpoint devices 110 and 112 may communicate with the core network 102 via different access networks, and so forth. Thus, these and other modifications are all contemplated within the scope of the present disclosure.

FIG. 3 illustrates a flowchart of an example method 300 for inserting replay events into production flows, in accordance with the present disclosure. In one example, steps, functions, and/or operations of the method 300 may be performed by a device as illustrated in FIG. 1, e.g., a replay dispatcher (e.g., the replay dispatcher 118 of FIG. 1). In one example, the steps, functions, or operations of method 300 may be performed by a computing device or system 600, and/or a processing system 602, as described in connection with FIG. 6 below. For instance, the computing device 600 may represent at least a portion of the replay dispatcher 118 in accordance with the present disclosure. For illustrative purposes, the method 300 is described in greater detail below in connection with an example performed by a processing system, such as processing system 602.

The method 300 begins in step 302 and proceeds to step 304. In step 304, the processing system may receive a signal requesting that a stream of replay events including a first replay event be inserted into a production flow of a service assurance platform of a telecommunications service provider network. As discussed above, in one example, the production flow comprises a stream of production events, i.e., real-time original (i.e., not replay) events observed by devices within the telecommunications service provider network and collected at a collection point (e.g., such as collector 116 of FIG. 1). The stream of replay events, on the other hand, contains duplicates of historical (or previously observed) production events, which are referred to as “replay events,” and which are to be replayed into the service assurance system (either in the production flow according to the request received in step 304 or in a separate, dedicated replay flow). The signal may be received in the form of a command line, by a user selection via a graphical user interface (GUI), or by other means.

The processing system may annotate the replay events with metadata to identify the events as replay events rather than production events, which facilitates proper downstream handling by components of the service assurance platform. In one example, the metadata may include at least an identifier that uniquely identifies the replay event and a timestamp that identifies a time at which the replay event was originally observed to occur as a production event. The metadata may also include a flow type identifier that identifies the flow type of the replay event (e.g., as either a production flow or a replay flow).

The metadata may be contained in a dedicated header to indicate that the replay event is a replay event (e.g., coming from a replay dispatcher rather than a collector). The use of a dedicated header may facilitate the reuse of existing collector (e.g., production) topics, e.g., so that new replay topics do not need to be created for each replay flow to be replayed into the service assurance system. The ability to reuse existing production topics for replay events, rather than create new replay topics for the replay events, may simplify maintenance operations by reducing the total number of topics to be managed by the service assurance system.

In another example, the metadata may include a flag set in a conventional header to indicate that the replay event is a replay event (e.g., setting a value of the header to zero (0) may indicate a production event, while setting the value to one (1) may indicate a replay event).

In step 306, the processing system may insert the stream of replay events into the production flow that is headed for the service assurance platform, in response to the signal. In one example, the replay events in the stream of replay events maintain the metadata that identifies the events as replay events when the replay events are inserted into the production flow. The stream of replay events may be inserted into the production flow at any one of multiple points in the production flow (e.g., directly before any of the pre-processing component 202, post-processing component 204, and ticketing component 206 described above).
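
A rough sketch of steps 304-306 is given below, assuming the topic-based `Collector` and the `make_replay_event` helper from the earlier sketches; the insertion-point names and topic names are illustrative, not prescribed by the disclosure.

```python
INSERTION_TOPICS = {
    # Hypothetical mapping from an insertion point to the production topic that
    # the component directly downstream of that point consumes.
    "pre_processing": "events.raw",
    "post_processing": "events.normalized",
    "ticketing": "events.alarmed",
}

def handle_replay_request(collector: Collector, historical_events: list,
                          insert_before: str = "pre_processing") -> None:
    """Annotate historical events as replay events (step 304) and insert them
    into the production flow at the requested point (step 306)."""
    topic = INSERTION_TOPICS[insert_before]
    for production_event in historical_events:
        replay_event = make_replay_event(production_event, target_flow="production")
        # The replay header stays on the event so that downstream components can
        # recognize it and route it to the replay-aware processing path.
        collector.publish(topic, replay_event)
```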

The method 300 may end in step 308.

FIG. 4 illustrates a flowchart of an example method 400 for inserting replay events into production flows, in accordance with the present disclosure. In one example, steps, functions, and/or operations of the method 400 may be performed by a device as illustrated in FIG. 1, e.g., a control device (e.g., a user endpoint device operated by a human technician, such as one of the user endpoint devices 108, 110, 112, and 114 of FIG. 1, or a replay dispatcher such as replay dispatcher 118 of FIG. 1, which may include a GUI). In one example, the steps, functions, or operations of method 400 may be performed by a computing device or system 600, and/or a processing system 602, as described in connection with FIG. 6 below. For instance, the computing device 600 may represent at least a portion of the user endpoint device 108, 110, 112, or 114 in accordance with the present disclosure. For illustrative purposes, the method 400 is described in greater detail below in connection with an example performed by a processing system, such as processing system 602.

The method 400 begins in step 402 and proceeds to step 404. In step 404, the processing system may send a signal instructing a replay dispatcher to insert a stream of replay events including a first replay event into a production flow of a service assurance platform of a telecommunications service provider network. As discussed above, in one example, the production flow comprises a stream of production events, i.e., real-time original (i.e., not replay) events observed by devices within the telecommunications service provider network and collected at a collection point (e.g., such as collector 116 of FIG. 1). The replay stream, on the other hand, comprises duplicates of historical (or previously observed) production events, which are referred to as “replay events,” and which are to be replayed into the service assurance system (e.g., either in the production flow in accordance with the request sent in step 404 or in a separate, dedicated replay flow).

The replay events may include metadata to identify the events as replay events rather than production events. In one example, the metadata may include at least an identifier that uniquely identifies the replay event and a timestamp that identifies a time at which the replay event was originally observed to occur as a production event. The metadata may also include a flow type identifier that identifies the flow type of the replay event (e.g., as either a production flow or a replay flow).

The metadata may be contained in a dedicated header to indicate that the replay event is a replay event (e.g., coming from a replay dispatcher rather than a collector). The use of a dedicated header may facilitate the reuse of existing collector (e.g., production) topics, e.g., so that new replay topics do not need to be created for each replay event to be replayed into the service assurance system. The ability to reuse existing production topics for replay events, rather than create new replay topics for the replay events, may simplify maintenance operations by reducing the total number of topics to be managed by the service assurance system.

In another example, the metadata may include a flag set in a conventional header to indicate that the replay event is a replay event (e.g., setting a value of the header to zero (0) may indicate a production event, while setting the value to one (1) may indicate a replay event).

The signal may additionally indicate where in the production flow to insert the replay stream. As discussed above, replay events may be inserted into the production flow at any one of multiple points in the production flow (e.g., directly before any of the pre-processing component 202, post-processing component 204, and ticketing component 206 described above).

In one example, the signal may be sent via a command line or via a selection of the production flow in a drop-down menu of a graphical user interface. For instance, the processing system may present the drop-down menu on a display to the human technician, where the drop-down menu allows the human technician to select a destination for the stream of replay events. In one example, the possible destinations presented by the GUI for selection include the production flow and a replay flow. In one example, the replay flow, as opposed to the production flow, contains no production events. In other words, the replay flow may exclusively contain replay events to be replayed into the service assurance system. In one example, the metadata of a replay event may further include metadata identifying the flow into which the replay event is to be inserted (e.g., production flow or replay flow), as discussed above.

In step 406, the processing system may obtain a ticket issued by a ticketing component of the service assurance platform for the first replay event. The ticket may include the header of the first replay event, which may allow the processing system to identify the ticket as corresponding to the first replay event. In one example, the ticket may be retrieved from a database that stores a plurality of tickets for a plurality of replay events (e.g., such as DB 106 of FIG. 1).

In step 408, the processing system may validate a policy associated with the service assurance platform, based on an analysis of the ticket. For instance, analysis of the ticket may help the processing system to determine whether a particular policy (e.g., a policy designed to pinpoint, diagnose, and/or resolve a service quality degradation or device malfunction in the telecommunication service provider network) is effective.

In optional step 410, the processing system may modify the policy in response to the validating. For instance, the validation may determine that the policy is not effective, or does not achieve at least some threshold rate of efficacy. When the policy is not effective, the processing system may receive a signal from a human technician or policy writer to make one or more modifications to the policy, where the modifications are designed to improve the efficacy of the policy. The policy, as modified, may then be tested and validated according to steps 404-410 as described above (e.g., by inserting replay events into the production flow and analyzing the tickets resulting from replay of the replay events).
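
Steps 404-410 amount to a test-and-refine loop. The sketch below is one hypothetical way to express it; the caller-supplied callables, the efficacy threshold, and the round limit are all assumptions made for illustration rather than requirements of the disclosure.

```python
from typing import Callable

def validate_policy(dispatch_replay: Callable[[], str],
                    fetch_tickets: Callable[[str], list],
                    score_policy: Callable[[list], float],
                    revise_policy: Callable[[list], None],
                    threshold: float = 0.9,
                    max_rounds: int = 3) -> bool:
    """Test-and-refine loop over steps 404-410; the callables are supplied by the caller."""
    for _ in range(max_rounds):
        job_id = dispatch_replay()               # step 404: replay into the production flow
        tickets = fetch_tickets(job_id)          # step 406: obtain the resulting tickets
        if score_policy(tickets) >= threshold:   # step 408: validate the policy
            return True
        revise_policy(tickets)                   # step 410: modify the policy and retry
    return False
```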

The method 400 may end in step 412.

FIG. 5 illustrates a flowchart of an example method 500 for processing replay events that have been inserted into production flows, in accordance with the present disclosure. In one example, steps, functions, and/or operations of the method 500 may be performed by a device as illustrated in FIG. 1, e.g., a ticketing component of a ticketing system (e.g., ticketing component 206 of FIG. 2, and more particularly by ticketing component 206 b). In one example, the steps, functions, or operations of method 500 may be performed by a computing device or system 600, and/or a processing system 602, as described in connection with FIG. 6 below. For instance, the computing device 600 may represent at least a portion of the ticketing component 206 in accordance with the present disclosure. For illustrative purposes, the method 500 is described in greater detail below in connection with an example performed by a processing system, such as processing system 602.

The method 500 begins in step 502 and proceeds to step 504. In step 504, the processing system may detect a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a stream of production events (e.g., collected by a collector such as collector 116 of FIG. 1) and a stream of replay events including at least the first replay event (e.g., inserted into the production flow by a replay dispatcher, such as replay dispatcher 118 of FIG. 1).

In one example, the processing system may detect the first replay event by detecting metadata associated with the first replay event that identifies the first replay event as a replay event rather than a production event. For instance, the first replay event may include a dedicated header as described above. The header may include, for example, an identifier that uniquely identifies the replay event, a timestamp that identifies a time at which the replay event was originally observed to occur as a production event, and/or a flow type identifier identifying the type of flow (e.g., production or replay). Production events may not include this header.
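
In terms of the earlier `Event` sketch, the detection of step 504 reduces to a header check; the function name below is an assumption used only for illustration.

```python
def is_replay_event(event: Event) -> bool:
    """Step 504: recognize a replay event by its dedicated replay header,
    which production events do not carry."""
    return event.replay_header is not None
```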

In one example, the first replay event may be associated with an alarm. For instance, the first replay event may have been received directly from an alarm component of the service assurance platform.

In step 506, the processing system may process the first replay event to generate a ticket for the first replay event. As discussed above, the ticket may comprise a tracking tool that contains information about the first replay event. The ticket may be used, for example, to prioritize alarmed replay events, organize alarmed replay events into a queue, correlate alarmed replay events, and the like.

In step 508, the processing system may deliver the ticket to a database that stores a plurality of tickets generated for a plurality of replay events. The database may be accessible by a human technician or a policy writer for the service assurance platform. The policy writer may review the ticket in order to determine whether the ticket indicates a need to modify one or more policies associated with the service assurance platform, where the policies may be designed to pinpoint, diagnose, and/or resolve service quality degradations or device malfunctions in the telecommunication service provider network.

The method 500 may end in step 510.

It should be noted that the methods 300, 400, and 500 may be expanded to include additional steps or may be modified to include additional operations with respect to the steps outlined above. In addition, although not expressly specified, one or more steps, functions, or operations of the methods 300, 400, and 500 may include a storing, displaying, and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the methods can be stored, displayed, and/or outputted either on the device executing the method or to another device, as required for a particular application. Furthermore, steps, blocks, functions, or operations in FIG. 3, FIG. 4, or FIG. 5 that recite a determining operation or involve a decision do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step. Furthermore, steps, blocks, functions, or operations of the above described methods can be combined, separated, and/or performed in a different order from that described above, without departing from the examples of the present disclosure.

FIG. 6 depicts a high-level block diagram of a computing device or processing system specifically programmed to perform the functions described herein. As depicted in FIG. 6, the processing system 600 comprises one or more hardware processor elements 602 (e.g., a central processing unit (CPU), a microprocessor, or a multi-core processor), a memory 604 (e.g., random access memory (RAM) and/or read only memory (ROM)), a module 605 for inserting replay events into production flows, and various input/output devices 606 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, an input port, and a user input device (such as a keyboard, a keypad, a mouse, a microphone, and the like)). Although only one processor element is shown, it should be noted that the computing device may employ a plurality of processor elements. Furthermore, although only one computing device is shown in the figure, if the method 300, 400, or 500 as discussed above is implemented in a distributed or parallel manner for a particular illustrative example, i.e., the steps of the above method 300, 400, or 500, or the entire method 300, 400, or 500, is implemented across multiple or parallel computing devices, e.g., a processing system, then the computing device of this figure is intended to represent each of those multiple computing devices.

Furthermore, one or more hardware processors can be utilized in supporting a virtualized or shared computing environment. The virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices. In such virtualized virtual machines, hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented. The hardware processor 602 can also be configured or programmed to cause other devices to perform one or more operations as discussed above. In other words, the hardware processor 602 may serve the function of a central controller directing other devices to perform the one or more operations as discussed above.

It should be noted that the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable gate array (PGA) including a Field PGA, or a state machine deployed on a hardware device, a computing device, or any other hardware equivalents, e.g., computer readable instructions pertaining to the method discussed above can be used to configure a hardware processor to perform the steps, functions, and/or operations of the above disclosed method 300, 400, or 500. In one example, instructions and data for the present module or process 605 for inserting replay events into production flows (e.g., a software program comprising computer-executable instructions) can be loaded into memory 604 and executed by hardware processor element 602 to implement the steps, functions, or operations as discussed above in connection with the illustrative method 300, 400, or 500. Furthermore, when a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.

The processor executing the computer readable or software instructions relating to the above described method can be perceived as a programmed processor or a specialized processor. As such, the present module 605 for inserting replay events into production flows (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette, and the like. Furthermore, a “tangible” computer-readable storage device or medium comprises a physical device, a hardware device, or a device that is discernible by the touch. More specifically, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.

While various examples have been described above, it should be understood that they have been presented by way of illustration only, and not a limitation. Thus, the breadth and scope of any aspect of the present disclosure should not be limited by any of the above-described examples, but should be defined only in accordance with the following claims and their equivalents.

1. A method comprising: detecting, by a processing system including at least one processor, a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a real time stream of new production events and a stream of replay events including at least the first replay event, and wherein each replay event in the stream of replay events comprises an event that was observed in the production flow in the past and that has been duplicated and associated with metadata prior to the each replay event being inserted into the real time stream of new production events to produce the production flow; processing, by the processing system, the first replay event to generate a ticket for the first replay event; and delivering, by the processing system, the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

2. The method of claim 1, wherein production events of the real time stream of new production events are collected by a collector of the service assurance platform.

3. The method of claim 2, wherein at least some of the production events are duplicated by a replay dispatcher of the service assurance platform to produce a plurality of replay events of the stream of replay events.

4. The method of claim 3, wherein the stream of replay events is inserted into the real time stream of new production events by the replay dispatcher to produce the production flow.

5. The method of claim 3, wherein the replay dispatcher associates the metadata with each replay event of the plurality of replay events to distinguish the each replay event from the production events.

6. The method of claim 5, wherein the detecting comprises: determining, by the processing system, that the first replay event includes the metadata.

7. The method of claim 5, wherein the metadata is contained in a header, and wherein the production events do not include the header.

8. The method of claim 7, wherein the metadata includes at least one of: a unique identifier that identifies a specific job associated with the first replay event, a time stamp indicating a time at which a production event of which the first replay event is a duplicate was observed in the production flow, or a flow indicator identifying the first replay event as belonging to the real time stream of new production events or the stream of replay events.

9. The method of claim 6, wherein the production flow is obtained by the processing system from a post-processing component of the service assurance platform that has generated an alarm for the first replay event.

10. The method of claim 9, wherein the first replay event has been processed by a pre-processing component of the service assurance system, prior to arriving at the post-processing component, to be converted from a machine-readable form to a human-readable form.

11. The method of claim 10, wherein the metadata is preserved by the pre-processing component and the post-processing component prior to the detecting.

12. The method of claim 1, wherein the first replay event comprises a replay of at least one of: a simple network management protocol event, a system logging protocol event, a polling event, or a streaming telemetry event.

13. A non-transitory computer-readable medium storing instructions which, when executed by a processing system including at least one processor, cause the processing system to perform operations, the operations comprising: detecting a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a real time stream of new production events and a stream of replay events including at least the first replay event, and wherein each replay event in the stream of replay events comprises an event that was observed in the production flow in the past and that has been duplicated and associated with metadata prior to the each replay event being inserted into the real time stream of new production events to produce the production flow; processing the first replay event to generate a ticket for the first replay event; and delivering the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

14. A system comprising: a processing system including at least one processor; and a non-transitory computer-readable medium storing instructions which, when executed by the processing system, cause the processing system to perform operations, the operations comprising: detecting a first replay event in a production flow being processed by a service assurance platform of a telecommunications service provider network, where the production flow comprises a real time stream of new production events and a stream of replay events including at least the first replay event, and wherein each replay event in the stream of replay events comprises an event that was observed in the production flow in the past and that has been duplicated and associated with metadata prior to the each replay event being inserted into the real time stream of new production events to produce the production flow; processing the first replay event to generate a ticket for the first replay event; and delivering the ticket to a database that stores a plurality of tickets generated for a plurality of replay events.

15. The system of claim 14, further comprising: a collector of the service assurance platform to collect production events of the real time stream of new production events; and a replay dispatcher to duplicate at least some of the production events to produce a plurality of replay events of the stream of replay events and to associate the metadata with each replay event of the plurality of replay events.

16. The system of claim 15, further comprising: a pre-processing component of the service assurance platform to convert the production events and the plurality of replay events from computer-readable form to human-readable form; and a post-processing component to generate one or more alarms for the production events and the plurality of replay events.

17. The system of claim 16, wherein the pre-processing component, the post-processing component, and a ticketing component of which the processing system is a part preserve the metadata.

18. The system of claim 15, wherein the metadata includes, for the first replay event, at least one of: a unique identifier that identifies a specific job associated with the first replay event, a time stamp indicating a time at which a production event of which the first replay event is a duplicate was observed in the production flow, or a flow indicator identifying the first replay event as belonging to the stream of production events or the stream of replay events.

19. The system of claim 14, further comprising: a ticketing correlation component to correlate tickets generated for the production events.

20. The system of claim 14, wherein the first replay event comprises a replay of at least one of: a simple network management protocol event or a system logging protocol event.